Extracting a PP Attachment Data Set from a German Dependency Treebank Using Topological Fields

Authors

  • Daniël de Kok
  • Corina Dima
  • Jianqiang Ma
  • Erhard W. Hinrichs
Abstract

PP attachment has traditionally been tackled as a binary classification task in which a preposition is attached either to the immediately preceding noun or to the main verb. In this paper, we analyze PP attachment in German and show that the assumption that prepositions have only two head candidates does not hold. We propose a realistic PP attachment data set in which each preposition has multiple head candidates. The data set is extracted automatically from a dependency treebank with topological field annotations. Finally, we show that PP attachment is substantially more difficult on this realistic data set than on a binary classification data set.
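The difference between the binary formulation and a multi-candidate formulation can be illustrated with a minimal Python sketch. It assumes a CoNLL-like token representation with STTS part-of-speech tags and gold head indices. The candidate criteria, the function names (`binary_candidates`, `multi_candidates`, `extract_instances`) and the toy sentence with its head annotations are hypothetical and invented for illustration. This is not the authors' extraction procedure, which additionally relies on the treebank's topological field annotation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Token:
    idx: int   # 1-based position in the sentence
    form: str
    pos: str   # STTS part-of-speech tag
    head: int  # index of the gold head; 0 marks the root


PREP_TAGS = {"APPR", "APPRART"}  # preposition, preposition fused with article
NOUN_TAGS = {"NN", "NE"}         # common noun, proper noun


def is_noun(tok: Token) -> bool:
    return tok.pos in NOUN_TAGS


def is_verb(tok: Token) -> bool:
    # All STTS verb tags (VVFIN, VAINF, VMFIN, ...) start with "V".
    return tok.pos.startswith("V")


def binary_candidates(sent: List[Token], prep: Token) -> List[int]:
    """Traditional setup: the closest preceding noun and the main verb."""
    nouns = [t for t in sent if t.idx < prep.idx and is_noun(t)]
    # Hypothetical simplification: the main verb is the verbal root.
    roots = [t for t in sent if t.head == 0 and is_verb(t)]
    cands = set()
    if nouns:
        cands.add(nouns[-1].idx)
    if roots:
        cands.add(roots[0].idx)
    return sorted(cands)


def multi_candidates(sent: List[Token], prep: Token) -> List[int]:
    """Hypothetical multi-candidate setup: every preceding noun plus every verb.

    Verbs are not restricted to the left of the preposition, because German
    verbs can follow the PP (e.g. in the right sentence bracket).
    """
    cands = {t.idx for t in sent
             if (is_noun(t) and t.idx < prep.idx) or is_verb(t)}
    return sorted(cands)


Instance = Tuple[str, List[int], int, bool]


def extract_instances(
    sent: List[Token],
    candidate_fn: Callable[[List[Token], Token], List[int]],
) -> List[Instance]:
    """One instance per preposition: (form, candidates, gold head, gold covered?)."""
    instances = []
    for tok in sent:
        if tok.pos in PREP_TAGS:
            cands = candidate_fn(sent, tok)
            instances.append((tok.form, cands, tok.head, tok.head in cands))
    return instances


# Invented toy sentence: "Sie isst die Pizza aus dem Ofen mit Salami"
# ('She eats the pizza from the oven with salami'); both PPs modify "Pizza".
SENT = [
    Token(1, "Sie", "PPER", 2),
    Token(2, "isst", "VVFIN", 0),
    Token(3, "die", "ART", 4),
    Token(4, "Pizza", "NN", 2),
    Token(5, "aus", "APPR", 4),
    Token(6, "dem", "ART", 7),
    Token(7, "Ofen", "NN", 5),
    Token(8, "mit", "APPR", 4),
    Token(9, "Salami", "NN", 8),
]

if __name__ == "__main__":
    for name, fn in [("binary", binary_candidates), ("multi", multi_candidates)]:
        for inst in extract_instances(SENT, fn):
            print(name, inst)
```

For "mit" in this toy sentence, the binary setup offers only "Ofen" (the closest preceding noun) and "isst" (the main verb), so the gold head "Pizza" cannot be selected at all. The multi-candidate setup includes it. A realistic extraction would also bound the candidate set using clause structure, which is where topological fields come in; the sketch ignores this.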

Similar resources

How Bad Is The Problem Of PP-Attachment? A Comparison Of English, German And Swedish

The correct attachment of prepositional phrases (PPs) is a central disambiguation problem in parsing natural languages. This paper compares the baseline situation in English, German and Swedish based on manual PP attachments in various treebanks for these languages. We argue that cross-language comparisons of the disambiguation results in previous research are impossible because of the different...

What Treebanks Can Do For You: Rule-based and Machine-learning Approaches to Anaphora Resolution in German

This paper compares two approaches to computational anaphora resolution for German: (i) an adaptation of the rule-based RAP algorithm that was originally developed for English by Lappin and Leass, and (ii) a hybrid system for anaphora resolution that combines a rule-based pre-filtering component with a memory-based resolution module. The data source is provided by the TüBa-D/Z treebank of German...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Serialising the ISO SynAF Syntactic Object Model

This paper introduces , an XML format developed to serialise the object model defined by the ISO Syntactic Annotation Framework SynAF. Based on widespread best practices we adapt a popular XML format for syntactic annotation, TigerXML, with additional features to support a variety of syntactic phenomena including constituent and dependency structures, binding, and different node types ...

Publication date: 2017